Message Length Estimators, Probabilistic Sampling and Optimal Prediction

Authors

  • Ian Davidson
  • Ke Yin
Abstract

The Rissanen (MDL) and Wallace (MML) formulations of learning by compact encoding only provide a decision criterion for choosing between two or more models; they do not provide any guidance on how to search through the model space. Typically, deterministic search techniques such as the expectation maximization (EM) algorithm have been used extensively with the MML/MDL principles to find the single shortest model. However, the probabilistic nature of the MML and MDL approaches makes Markov chain Monte Carlo (MCMC) sampling readily applicable. Sampling involves creating a stochastic process that visits each model in the model space with a chance equal to its posterior probability, and it has many benefits. We show that for MML estimators using mixture modeling, sampling can find shorter models than deterministic EM search. Samplers can also be used to perform optimal Bayesian prediction (OBP), also known as Bayesian model averaging, which involves making predictions by calculating the expectation of the predictor with respect to the posterior over all models. We show that for prediction, OBP can outperform even the shortest model, and we discuss the implications of basing predictions on a collection of models rather than on the shortest model. Furthermore, since MML/MDL effectively discretizes the parameter space, attaching a probability estimate to each region, sampling across model spaces of varying dimension/complexity becomes possible.

Introduction

The process of inductive learning essentially abstracts, generalizes or compresses the data into a model from which predictions of the future can be made. This was first formally noted by Solomonoff [1] and Chaitin [2], but it was not until the Rissanen (MDL) [3] and Wallace (MML) [4] formulations of learning by compact encoding using Shannon's information theory that a computable approach became available. However, the MML and MDL approaches only provide a decision criterion for choosing between two or more models; they do not provide any guidance on how to search through the collection of possible models in the model space. Though Levin's complexity-oriented optimal universal search [5] exists for classes of inversion problems, its application to probabilistically formulated MDL/MML problems seems difficult. Typically, deterministic search techniques such as the expectation maximization (EM) algorithm have been used extensively [6] with the MML/MDL principles to find the single best model, that is, the one that results in the shortest total encoding of the model and of the data given the model. However, the Bayesian nature of the MML and MDL approaches means that approaches from the field of Markov chain Monte Carlo (MCMC) sampling are readily applicable. Sampling involves creating a stochastic process that visits each model with a chance equal to its posterior probability¹, and it has several benefits over trying to converge to the best model. Furthermore, MML/MDL effectively discretizes the parameter space, attaching a probability estimate to each region, which makes possible sampling across model spaces of varying dimension/complexity. We show that for MML estimators using mixture modeling, sampling can outperform deterministic EM search and can be used to perform optimal Bayesian prediction (OBP), also known as Bayesian model averaging, which outperforms even the best model. We briefly discuss the implications of using OBP instead of basing predictions on the best model.

¹ P(θ)·P(D|θ) = 2^(−MessLen(θ, D)) when the lengths are measured in bits.
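Because of the footnote identity, a message length estimator can drive an MCMC sampler directly: a Metropolis move from a model with length L to one with length L' is accepted with probability min(1, 2^(L − L')), so shorter models are visited exponentially more often. The following Python sketch illustrates this for a toy one-parameter model; it is our illustration, not the authors' code, and the message_length function with its |θ|-bit model cost is a purely hypothetical stand-in for a real MML estimator.

import math
import random

def message_length(theta, data):
    """Two-part length in bits for a toy model: a single Gaussian with
    mean `theta` and unit variance. The model part charges |theta| bits,
    an illustrative stand-in for a real MML prior/region cost."""
    model_bits = abs(theta)
    nll_nats = sum(0.5 * (x - theta) ** 2 + 0.5 * math.log(2 * math.pi)
                   for x in data)
    return model_bits + nll_nats / math.log(2)  # convert nats to bits

def sample_models(data, n_steps=10_000, step=0.5, seed=0):
    """Metropolis sampler over models: since P(theta)P(D|theta)
    = 2**(-MessLen), accepting with probability
    min(1, 2**(old_len - new_len)) visits each model with a chance
    equal to its posterior probability."""
    rng = random.Random(seed)
    theta, length = 0.0, message_length(0.0, data)
    draws = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        new_length = message_length(proposal, data)
        if new_length <= length or rng.random() < 2.0 ** (length - new_length):
            theta, length = proposal, new_length
        draws.append(theta)
    return draws

data = [1.1, 0.7, 1.4, 0.9, 1.2]
draws = sample_models(data)
# Optimal Bayesian prediction (model averaging) uses all the draws,
# not just the single shortest model found:
obp_mean = sum(draws) / len(draws)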
MML Estimators

MDL/MML inference involves constructing a two-part string to be transmitted between a sender and a receiver: the model or theory of the observations, and the observations encoded with respect to the model. The best model has the shortest total (sum of both parts) message length [7]. A particularly desirable property of the principle is that it discretizes a continuous parameter space into regions, attaching a probability estimate to each. This enables comparing models of different complexity, such as a three-class and a five-class clustering model, as both models are converted to the same unit of measure: bits of information. Techniques such as maximum likelihood estimation instead compute probability densities, making comparisons of models with different complexities analogous to comparing models whose goodness is measured in different units. The various formulations of the MML principle are effectively different ways to calculate the dimensions of each region. For example, the 1968 MML Gaussian formulation sub-optimally solved for the height and width separately to obtain

width_μ = s·√(12/N),  height_σ = s·√(6/(N − 1)),

where s is the sample standard deviation and N the number of instances. Later formulations of MML and MDL make use of the Fisher information to solve for all the region dimensions simultaneously, the MML formulation being

AOPV(θ) = √(12/F(θ)),

where F(θ) is the expected Fisher information. Each region has a representative model that, given the data, is indistinguishable from all other models in the region. The MML estimate for a given induction problem is the representative model of the most probable region. Some highly probable regions for the simple univariate case are shown in Figure 1.
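As a worked check that the two expressions agree (our step, assuming the reconstructed formulas above): for N observations from a Gaussian with known standard deviation σ, the expected Fisher information of the mean is F(μ) = N/σ², so the general AOPV expression recovers the 1968 width for the mean once s estimates σ:

F(μ) = N/σ²  ⇒  AOPV(μ) = √(12/F(μ)) = σ·√(12/N) ≈ s·√(12/N) = width_μ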


Related Articles

Minimum Encoding Approaches for Predictive Modeling

We analyze differences between two information-theoretically motivated approaches to statistical inference and model selection: the Minimum Description Length (MDL) principle, and the Minimum Message Length (MML) principle. Based on this analysis, we present two revised versions of MML: a pointwise estimator which gives the MML-optimal single parameter model, and a volumewise estimator which ...



Count-Based Frequency Estimation with Bounded Memory

Count-based estimators are a fundamental building block of a number of powerful sequential prediction algorithms, including Context Tree Weighting and Prediction by Partial Matching. Keeping exact counts, however, typically results in a high memory overhead. In particular, when dealing with large alphabets the memory requirements of count-based estimators often become prohibitive. In this paper...


Energy-Aware Probabilistic Epidemic Forwarding Method in Heterogeneous Delay Tolerant Networks

With the increasing use of wireless communications, infrastructure-less networks such as Delay Tolerant Networks (DTNs) deserve serious consideration. A DTN is most suitable where connectivity between communicating nodes is intermittent, as with wireless mobile ad hoc network nodes. In general, a message-sending node in a DTN copies the message and transmits it to the nodes it encounters. A...


Minimum Message Length Inference and Parameter Estimation of Autoregressive and Moving Average Models

This technical report presents a formulation of the parameter estimation and model selection problem for Autoregressive (AR) and Moving Average (MA) models in the Minimum Message Length (MML) framework. In particular, it examines suitable priors for both classes of models, and subsequently derives message length expressions based on the MML87 approximation. Empirical results demonstrate the new...



Journal title:

Volume   Issue

Pages  -

Publication date: 2003